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Campylobacter jejuni is the most common bacterial cause of foodborne disease in the developed 
world. Its general physiology and biochemistry, as well as the mechanisms enabling it to colonize 
and cause disease in various hosts, are not well understood, and new approaches are required to 
understand its basic biology. High-throughput sequencing technologies provide unprecedented 
opportunities for functional genomic research. Recent studies have shown that direct lllumina 
sequencing of cDNA (RNA-seq) is a useful technique for the quantitative and qualitative 
examination of transcriptomes. In this study we report RNA-seq analyses of the transcriptomes 
of C. jejuni (NCTC1 1 1 68) and its rpoN mutant. This has allowed the identification of hitherto 
unknown transcriptional units, and further defines the regulon that is dependent on rpoN for 
expression. The analysis of the NCTC1 1 1 68 transcriptome was supplemented by additional 
proteomic analysis using liquid chromatography-MS. The transcriptomic and proteomic datasets 
represent an important resource for the Campylobacter research community. 



INTRODUCTION 

Campylobacter jejuni is the most common bacterial cause of 
foodborne disease in the developed world, with an estimated 
one in every 100 individuals in the UK developing a 
Campylobacter-related illness each year, resulting in an 
economic burden of up to £500 million per annum 
(Humphrey et ah, 2007). Although most cases are self- 
limiting, C jejuni has been associated with severe post- 
infection complications, including polyneuropathies such as 
Guillain-Barre syndrome (Goodyear et ah, 1999). The 
bacterium is a common gut commensal of animals destined 
for human consumption, and faecal contamination of meat 
during processing is a recognized route of transmission to 
humans (Corry & Atabay, 2001; Herman et al, 2003). C. 
jejuni (NCTC11 168) was the first food-borne pathogen to be 



Abbreviations: CDS, coding sequence; LC, liquid chromatography; 
ncRNA, non-coding RNA; RNA-seq, transcriptome sequencing; RPKM, 
reads per kb of transcript per million mapped reads. 

The complete dataset for quantifying transcript levels in this study has 
been deposited in the ArrayExpress database with the accession no. 
E-MTAB-706. The complete protein sequencing dataset from this study 
has been deposited in the PRIDE database with the accession nos 
17661 and 17684. 

Five supplementary tables and a supplementary figure are available with 
the online version of this paper. 



sequenced (Parkhill et ah, 2000) due to its importance as a 
pathogen and the fact that it has a relatively small genome 
(1.6 Mb). Many other Campylobacter species have been 
sequenced, and these genome sequences have facilitated 
many functional genomics -based approaches to study the 
roles of individual genes and proteins in colonization, 
environmental survival and virulence. 

An important step in the control of bacterial gene 
expression is the initiation of transcription, which is 
dependent on the DNA-dependent RNA polymerase core 
enzyme binding a sigma factor (Polyakov et al, 1995). 
Unlike Escherichia coli, which has seven sigma factors, the 
genome sequence of C jejuni NCTC11168 revealed the 
presence of only three (Parkhill et al, 2000): RpoD (a 70 ), 
responsible for the transcription of most genes (Petersen 
et al, 2003), and RpoN (a 54 ) and RpoF/FliA (a 28 ), which 
control the expression of genes required for fkgellum 
biosynthesis and chemo taxis (Hendrixson et al, 2001; 
Jagannathan et al., 2001; Wassenaar et al, 1994). The lack 
of sigma factors in C jejuni suggests that alternative 
regulatory mechanisms must exist and/or that the two 
alternative sigma factors play wider regulatory roles than 
those characterized in other species. Although a 54 is known 
to be involved in the transcription of flagellar genes 
including those encoding the hook, basal body and minor 
flagellin (Carrillo et al, 2004; Hendrixson & DiRita, 2003; 
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Wosten et al, 2004), the complete RpoN regulon in C. jejuni 
has not been fully defined. 

C. jejuni remains a poorly understood pathogen and many 
of the paradigms that have been established with 'model' 
bacterial species such as E. coli and Salmonella enterica do 
not apply to it. Consequently, new approaches are required 
to understand this pathogen. Recently, several eukaryotes 
(Cloonan et al, 2008; Mortazavi et ah, 2008; Wilhelm et ah, 

2008) and prokaryotes (Perkins et al, 2009; Yoder-Himes 
et al, 2009) have been profiled at the transcriptome 
level using direct high-throughput Illumina sequencing of 
cDNA. This technique offers the ability to survey the entire 
transcriptome of an organism in a high-throughput and 
quantitative manner (Marioni et al, 2008; Wang et ah, 

2009) . Since a large number of sequencing reads can be 
readily obtained, the method is sensitive and offers a large 
dynamic range. Thus RNA-seq can be used to detect and 
quantify RNA expressed at very low levels, in contrast with 
what can be found using DNA microarrays (Nagalakshmi 
et al, 2008). 

In this study we have used Illumina high-throughput 
DNA sequencing to study mRNA expression levels (i.e. 
the transcriptome) of C. jejuni NCTC11168, and have 
compared the transcriptome data with protein expression 
data. In addition we have used RNA-seq to map and 
quantify the transcriptome of an rpoN mutant. We have 
compared RNA-seq with existing microarray technologies 
for their abilities to identify differentially expressed genes, 
and we have identified putative small non-coding RNAs 
(ncRNAs) that would not have been apparent using 
microarray technology. 

METHODS 

Bacteria. E. coli strain DH5a was grown at 37 °C on Luria-Bertani 
(LB) agar or in LB broth. Preparation of electrocompetent E. coli cells 
and transformations were performed as described previously (Dower 
et al, 1988). C. jejuni was routinely cultured on Mueller-Hinton 
(MH) agar (Oxoid) supplemented with 5% defibrinated horse 
blood (hereafter referred to as MH blood agar plates) at 42 °C under 
microaerobic conditions (5% 0 2 , 5% C0 2 , 5% H 2 , 85% N 2 ) in a 
MACS VA500 variable atmosphere workstation (Don Whitley 
Scientific). Liquid cultures of C. jejuni were grown in brain heart 
infusion (BHI) broth (Oxoid) at 42 °C under microaerobic condi- 
tions with agitation at 150 r.p.m. Where necessary for selection, 
media were supplemented with chloramphenicol (10 jig ml -1 ), 
ampicillin (100 jig ml -1 ) or trimethoprim (10 jig ml -1 ). 

Generation of C. jejuni NCTC11168 rpoNwcat mutant. The 

C. jejuni rpoN mutant was generated by insertional inactivation of 
the gene using a chloramphenicol acetyltransferase (cat) cassette. 
Standard methods were used for molecular cloning (Sambrook & 
Russell, 2001). Chromosomal and plasmid DNA purifications were 
performed using commercial kits following the manufacturers' 
instructions (Qiagen). Routine DNA modifications including restric- 
tion endonuclease digestion of DNA, modifications of DNA and 
ligations were carried out according to the manufacturers, instructions 
(Promega, Invitrogen, Roche, New England Bioloabs). PCR primers 
were designed using Primer3 (http://frodo.wi.mit.edu/primer3/) and 
purchased from Sigma. All PCRs were performed in 50 ul reaction 



volumes in 0.2 ml Eppendorf tubes in an Applied Biosystems 
GeneAmp PCR Systems 9700 (Applied Biosystems). Reactions 
contained 200 uM dNTPs, 2 mM Mg 2 + , 0.01 vols Proof Start DNA 
polymerase (Qiagen; 2.5 U ul -1 ), 0.1 vols polymerase buffer (10 x ), 
1 uM forward and reverse primers, and template DNA (-50 ng 
plasmid DNA or -100 ng chromosomal DNA). Typical thermal 
cycler conditions were 94 °C for 4 min, 30 cycles of 94 °C for 1 min, 
55 °C for 1 min and 72 °C for 1 min, followed by a final extension of 
72 °C for 7 min. DNA and RNA concentration and purity were 
measured using a Nanodrop ND-1000 spectrophotometer. The rpoN 
gene was amplified by PCR from C. jejuni NCTC11168 chromosomal 
DNA using primers ajg285 ( 5 ' -GGGGGGATCCATGTTAAAGCA- 
AAAAATCACC-3 ' ) and ajg286 ( 5 ' -GGGGGGATCCTTATCCTT- 
CAAGTTCATATAA-3'); the resulting fragment was cloned as a 
BamHl fragment into Ba mH I -digested pUC19 to generate the 
plasmid pAKl. To facilitate subsequent selections a chloramphenicol 
resistance gene with its promoter was amplified by PCR from 
pEnterprise2 (Hendrixson et al, 2001) using primers ajg289 (5'- 
GGATCCCTTAAGCTCGGCGGTGTTCCTTTCCAA-3 ' ) and ajg290 
( 5 ' -GGATCCCTTAAGCGCTTTAGTTCCTAAAGGG-3 ' ) and inserted 
as an AfUl fragment into A/Hl-digested pAKl (rpoN contained a unique 
AfUl restriction site), generating pAK2. pAK2, with a cat cassette in the 
forward orientation relative to rpoN, was introduced into C. jejuni by 
natural transformation using a plate biphasic method adapted from 
Van Vliet et al (1998). The structure of a representative isolate, with a 
chromosomally located rpoN: : cat insertion, was confirmed by PCR 
and Southern hybridization (data not shown). 

Motility assays. Motility assays were performed essentially as 
described by Silverman & Simon (1973). Briefly, a platinum wire 
was dipped into a single colony and used to stab MH motility 
medium containing 0.4 % agar. Motility was assessed by measuring 
colony diameter after incubation at 42 °C for 16 h. 

RNA preparation for RNA-seq. C. jejuni strains were cultured for 
48 h on MH blood agar plates with antibiotics added as appropriate. 
Bacterial lawns were harvested in 1 ml BHI broth, 50 ul of which was 
inoculated into 10 ml BHI broth in a 50 ml Falcon centrifuge tube 
with a loosened cap, and grown for 16 h at 42 °C under microaerobic 
conditions with agitation at 150 r.p.m. to generate a starter culture. 
The OD 600 of starter cultures was measured by using a 6305 UV/ 
Visible Spectrophotometer (Jenway). Starter cultures were diluted 
appropriately in BHI broth and used to inoculate 10 ml BHI broth 
containing 10 jig trimethoprim ml -1 in a 50 ml falcon tube 
(previously equilibrated at 42 °C in microaerobic conditions for 
16 h) to a calculated OD 600 of 0.00002. Cultures were grown at 42 °C 
under microaerobic conditions with shaking at 150 r.p.m. for 24 h. 
Growth was monitored by recording OD 60 o and by determining 
viable c.f.u. ml -1 by serial 10-fold dilution of cultures in BHI broth 
and plating onto MH agar. Plates were incubated microaerobically at 
42 °C for 48 h before colonies were counted. C. jejuni cultures were 
fixed with 2 vols RNA protect bacteria (Qiagen) and harvested. RNA 
was isolated from the pellet using the SV RNA isolation kit (Promega) 
according to the manufacturer's instructions. 23S and 16S rRNA was 
depleted using a MicrobExpress kit (Ambion). Genomic DNA was 
removed with two digestions, using amplification- grade DNase I 
(Invitrogen), to below levels detectable by PCR. RNA was reverse- 
transcribed using random primers (Invitrogen) and Superscript III 
(Invitrogen) at 45 °C for 3 h and denatured at 70 °C for 15 min. leuB 
(using primers QJAW084 5 ' -GCAAGTATAGATGCTTATGGAG- 
TG-3' and QJAW085 5 ' -CTCTTTCAGGTCTTTGATCTATGG-3 ' ) , 
aroA (QJAW092 5 ' -GCTTTGGCTAAGGGTAAATCTAGT-3 ' and 
QJAW093 5 ' -ATCAAGTTCTCTAGCTTCAACACC-3 ' ) , IpxD (QJAW 
104 5 ' -GGAGCTTATATAGGCGATAATGTC-3 ' and QJAW 105 5'- 
CAAAACCGTCACTTCCTATTACAC-3 ' ) and guaB (QJAW108 5'- 
GGGTGTTGATGTTGTTGTGC-3' and QJAW 109 5'-GGCGATAT- 
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TTCCTGCGATAA-3') were used as targets for a PCR as a positive 
control for reverse transcription. 

Library construction and sequencing. Library construction and 
sequencing were carried out as described previously (Perkins et aly 
2009). 

Read mapping and visualization. Reads were mapped to the 
C. jejuni NCTC11168 genome sequence (GenBank accession no. 
ALII 1168) using Novoalign version 2.05 (Novocraft Technologies; 
http://www.novocraft.com) with default settings. Reads that mapped 
to more than one position in the genome were randomly distributed 
amongst the matching regions. Mapped reads were visualized using 
the version of Bam View (Carver et al, 2010) integrated into Artemis 
(Rutherford et al, 2000). All subsequent analyses were performed 
using R version 2.8.0. 

Quantification of transcript levels. To estimate the level of 
transcription for each gene, the number of reads that mapped within 
each annotated coding sequence (CDS) was determined. To account 
for the possibility of unannotated ncRNA genes, the number of reads 
that mapped within each intergenic region of >150 bp was also 
determined. Reads that mapped within 50 bp of the adjacent genes 
were discarded to prevent the presence of 5' and 3' untranslated 
regions of adjacent transcripts from influencing the read count for the 
intergenic region. To enable comparison of expression levels, between 
both different RNA-seq experiments and different genes within the 
same experiment, it is necessary to normalize the read counts. The 
number of reads per kb of transcript per million mapped reads 
(RPKM) has been proposed as a useful metric that normalizes for 
variation in transcript length and sequence yield (Mortazavi et al> 
2008). RPKM values for each gene and intergenic region were 
calculated for the data from each Illumina lane, and mean RPKM 
values were determined for the wild-type and rpoN mutant samples. 
As RPKM values are log-normally distributed, it is convenient to 
express them as log 2 (RPKM + 1). The complete dataset from this 
study has been deposited in the ArrayExpress database (http://www. 
ebi.ac.uk/arrayexpress) with the accession no. E-MTAB-706. 

Analysis of differential expression. Differential expression was 
assessed using DESeq (Anders & Huber, 2010). The total read count 
was determined for each gene and intergenic region by combining 
data from the two technical replicate sequencing runs. Read counts 
for the wild-type and the rpoN mutant were compared to determine 
the log 2 -fold change in abundance of each transcript. DESeq allows a 
P-value to be determined in the absence of any available biological 
replicates, by treating the two conditions as replicates, under the 
assumption that only a small proportion of transcripts is differentially 
expressed. P-values were calculated under this assumption, and 
adjusted for multiple testing using the false discovery rate controlling 
procedure (Benjamini & Hochberg, 1995). 

Microarray analysis. Raw data from an earlier rpoN mutant 
microarray study (Kamal et al, 2007) were retrieved from 
BuG@Sbase and reanalysed by using Limma (Smyth, 2005). 
Background signals were subtracted from the foreground, and any 
resulting negative signals were assigned an arbitrary value of 0.5, to 
prevent problems arising from taking the log of a negative number. 
Within-array Loess normalization was performed for each array, and 
signals were scaled between arrays to have the same median-absolute- 
deviation (Smyth & Speed, 2003). Within-array replicate spot signals 
were averaged, and correlation of the signals for technical replicate 
arrays was assessed using the duplicate Correlation function (Smyth 
et aly 2005). Differential expression was assessed by using a linear 
modelling approach with empirical Bayes statistics (Smyth, 2004), 
and the resultant P- values were adjusted for multiple testing using the 



false discovery rate controlling procedure (Benjamini & Hochberg, 
1995). 

Cellular fractionation and protein sequencing. Bacteria were 
harvested from 5 ml culture (bacteria were grown as described above 
for RNA preparation for RNA-seq) by centrifugation in a micro- 
centrifuge (Hettich Mikro 22R) at 13 000 r.p.m. and 4 °C for 2 min. 
The supernatant was removed by aspiration, and the protein samples 
were resuspended in 2 x final sample buffer (FSB) [0.125 M Tris/HCl 
(pH 6.8), 4% SDS, 20% glycerol, 10% 2-mercaptoethanol, 0.05% 
Bromophenol Blue], reduced with DTT and alkylated with iodoace- 
tamide prior to separation in a 4-12% NuPAGE Bistris gel 
(Invitrogen). Gels were stained with colloidal Coomassie Blue 
(Sigma) and bands were excised and in-gel digested with trypsin 
(sequencing grade; Roche). The extracted peptides were analysed with 
online nano liquid chromatography (LC) -MS/MS on an Ultimate 
3000 Nano/Capillary LC System (Dionex) coupled to a Q-Tof 
Premier (Waters) or LTQ FT Ultra (ThermoElectron) mass 
spectrometer equipped with a nanoelectrospray source. Samples were 
first loaded and desalted on a PepMap CI 8 trap (0.3 or 0.5 mm id x 
5 mm, Dionex) then separated on a CI 8 analytical column [Atlantis 
100 um id x 22 cm (Waters), PepMap 75 um id x 25 cm (Dionex) 
or BEH 75 um id x 10 cm (Waters)] over a 45, 60 or 90 min linear 
gradient of 4-32% CH 3 CN/0.1% FA. The Q-Tof Premier mass 
spectrometer was operated in the standard data-dependent acquisi- 
tion (DDA) mode at a resolution of 10 000 (V mode) controlled by 
Masslynx 4.1. The survey scans were acquired over mlz 400-1500, and 
the four most abundant multiply charged ions (2 + , 3 + and 4 + ) with 
a minimal intensity at 20 or 30 counts s -1 were subject to MS/MS for 
3 s with dynamic exclusion for 120 s. The instrument was externally 
calibrated. The Raw files were processed by Proteinlynx Global Server 
(PLGS) 2.2.5 with and without deconvolution of MS/MS spectra by 
MaxEnt3. The LTQ FT Ultra mass spectrometer was operated in a 
similar DDA mode to the Q-Tof premier with a resolution at 100 000 
at mlz 400, as described previously (Pickard et a/., 2010). The raw files 
were processed with BioWorks 3.3 (ThermoElectron). The data were 
subjected to a database search with Mascot Server 2.2 (Matrix 
Science) against an in-house-built Campylobacter genomic six-frame 
translated database using the following parameters: trypsin/P with 
maximum three missed cleavage sites; peptide mass tolerance at 
+ 20 p.p.m. (FT) or 50 p.p.m. (Q-Tof Premier); MS/MS fragment 
mass tolerance at +0.49 Da (FT) or 0.2 Da (Q-Tof Premier); and 
variable modifications, Acetyl (Protein N-term), Carbamidomethyl 
(C), Deamidated (NQ), Dioxidation (M), Formyl (N-term), 
Gin— ►pyro-Glu (N-term Q), Methyl (E), Oxidation (M). The 
Mascot result was processed with the in-house Percolator (Brosch 
et aly 2009), and then filtered with the following rules: peptides with 
less than 7 amino acids or a posterior error probability >0.05 were 
removed. Proteins with two or more non-redundant peptides were 
kept. For those proteins with only one peptide identified, the peptide 
must be found in both datasets (Q-Tof Premier and FT) and with a 
PEP <0.01 and ^10 amino acids. If the peptide was found in only 
one of the datasets, then the peptide must also have a Mascot score 
above the identification threshold. This resulted in an overall false 
discovery rate <0.1 %. The complete dataset from this study has been 
deposited in the PRIDE database (http://www.ebi.ac.Uk/pride/q.do) 
with the accession nos 17661 and 17684. 

Secondary structure and conservation analysis for C. jejuni 
non-coding candidates. To identify potential unannotated ncRNA 
genes, large intergenic regions (>150 bp) from the complete genome 
sequence of C. jejuni NCTC1 1 168 were searched against RFAMSEQ (a 
subset of the EMBL nucleotide database) using a combination of wu- 
blast filters and covariance models (Altschul et aly 1990; Eddy & 
Durbin, 1994). Reliable matches were subsequently aligned to the 
reference sequence, and a consensus secondary structure was predi- 
cted using the WAR package (Torarinsson & Lindgreen, 2008); this 
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Fig. 1. RNA-seq sequence data mapped to 
the C. jejuni NCTC1 1 1 68 genome and 
visualized using Artemis and Bamview. The 
top section shows the position of reads 
derived from the wild-type (red) and reads 
from the rpoN : : cat mutant (green). The middle 
section shows a plot of sequence coverage, 
using the same colour scheme. The lower 
section shows a representation of the DNA 
strands and the six possible reading frames, 
and indicates the positions of annotated 
features. The highlighted gene, Cj1242, is 
downregulated in the rpoN mutant. This gene 
encodes the invasion antigen CiaC (Christensen 
et al, 2009). 



formed a 'seed' alignment. Covariance models were built from the 
resulting alignment and secondary structure; these were searched 
against RFAMSEQ10 using the Rfam pipeline (Eddy & Durbin, 1994; 
Gardner et al, 2009). New and reliable hits were added to the seed, 
and a new CM was built and researched. This procedure was iterated 
until there were no new reliable hits. The subsequent alignments were 
analysed using the structured RNA gene prediction methods RNAz 
(Washietl et al, 2005) and Alifoldz (Washietl & Hofacker, 2004). 
These were computed using a sliding window across alignments 
generated from the seed sequences using clustal w (Thompson et al., 
1994). The window size used was 100 columns and a step-size of 10 
columns for both methods. Each prediction method was run on both 
the forward and reverse complement strands. 



RESULTS AND DISCUSSION 



Mapping DNA sequence reads generated by 
lllumina-based RNA-seq to the annotated 
C. jejuni NCTC11168 genome - global 
interpretations 

In order to characterize the C. jejuni transcriptome using 
RNA-seq, RNA was prepared from a pool of five 
independent cultures of C. jejuni NCTC11168 grown for 
24 h to an OD 600 1.0 (log 10 10 c.f.u. ml" 1 ) in BHI broth. 
The 16S and 23S rRNA species were depleted prior to 
sequencing using selective capture and magnetic separa- 
tion. The resulting RNA was reverse-transcribed into 
cDNA which was then processed into a library of molecules 
that could be sequenced on the Illumina Genome Analyser. 
The procedure was repeated for RNA derived from a C. 
jejuni rpoN: : cat mutant. To assess technical variance, we 
sequenced each biological sample on two Illumina lanes. 
The sequence reads were mapped to the genome sequence 



of NCTC11168 and were displayed using the genome 
browser Artemis and Bamview (Fig. 1). The total number 
of reads obtained and mapped for each sample is detailed 
in Supplementary Table SI (available with the online 
version of this paper). As previously reported for RNA-seq 
analysis of eukaryotic (Wilhelm et al, 2008) and prokar- 
yotic (Perkins et al, 2009) RNA, the sequence coverage was 
not uniform across each CDS; however, our technical 
replicates were in close agreement (Supplementary Fig. SI), 
with the log 2 (RPKM+ 1) values for each gene and 
intergenic region giving Pearson correlation coefficients 
of 0.984 and 0.982 between replicates for the wild-type and 
rpoN::cat samples, respectively (P<2.2xl0~ 16 in both 
cases). Full details of the number of reads mapping to each 




logj(RPKM+1) 



Fig. 2. Density plot showing the distribution of log 2 (RPKM + 1 ) 
values obtained for the protein-coding genes in wild-type C. jejuni 
NCTC11168. 
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gene and intergenic region together with log 2 (RPKM+ 1) 
values are available in Supplementary Table S2. 

Protein-coding genes 

The distribution of log 2 (RPKM+ 1) values for the 1624 
protein-coding genes annotated in the C. jejuni NCTC1 1 168 



genome (Gundogdu et al, 2007) is shown in Fig. 2. The 
distribution is approximately normal, with a mean + sd of 
7.44 + 2.14. Only three protein-coding genes did not have 
any reads that mapped within their coding regions from 
either the wild-type or rpoN: : cat RNA-seq: Q0344, 
Cj0877c and Q0973. These genes are all annotated as 
encoding hypothetical proteins; Cj0877c is described as 
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Fig. 3. Artemis plots showing RNA-seq data 
obtained for pseudogenes (as for Fig. 1). (a) 
Cj0654c showed an expression pattern typical 
of most pseudogenes in the genome, with a 
high level of expression at the 5' end of the 
gene that diminishes towards the 3' end. (b) 
Cj1064 showed a high level of expression 
across the length of the pseudogene. 
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Very hypothetical'. Two more genes, annotated as 
encoding the hypothetical protein Cj0971 and the RepA 
homologue Cj 1667c, had no mapped reads in the wild- 
type sample but did have mapped reads derived from the 
rpoN: : cat sample. 

Pseudogenes 

The original annotation of C. jejuni NCTC11168 identified 
20 pseudogenes (Parkhill et al, 2000), and reanalysis of the 
genome revised this total to 19 (Gundogdu et al, 2007). We 
were able to detect reads for all of the pseudogenes 
(Supplementary Table S3, available with the online version 
of this paper) although only four exhibited a log 2 
(RPKM + 1) value above the first quartile of the distribution 
obtained from the protein-coding genes (Q0501, Q0565, 
Q1064 and Q1528). Closer inspection of the data indicates 
that most of the pseudogenes had a strong transcriptional 
profile at the start of the CDS with just a few isolated reads 
mapping to the remainder of the CDS (Fig. 3). Two of the 
pseudogenes (Cj 1064 and Q0501) had strong transcriptional 
profiles throughout the entire CDS, suggesting that tran- 
scription may be maintained to express functional domains 
or peptides, or that the inactivation of these genes was 
evolutionarily recent. The sequences of our reads matched 
the published genome sequences for these two genes. 

tRNA 

There are 43 tRNA-encoding genes annotated in the C. 
jejuni NCTC11168 genome (Gundogdu et al, 2007), with 
33 distinct anticodons. The log 2 (RPKM + 1) values of the 
tRNA genes show a similar mean to the protein-coding 
genes (7.67; Student's t-test, P= 0.3 188), but a reduced 



variance (Fisher's F test, P= 0.0040). It is a common 
observation that the use of alternative synonymous codons 
is non-random (Sharp et al, 2010); in C. jejuni this has 
been attributed to the low GC- content of the genome, with 
a preference for AT-rich synonyms (Fuglsang, 2003). It 
might be expected that the expression levels of the tRNAs 
would reflect the codon usage bias, but this does not seem 
to be the case (Supplementary Tables S4 and S5, available 
with the online version of this paper). For example, the 
tRNA gene with the highest log 2 (RPKM + 1) value (12.12) 
encodes tRNA-Arg with an anticodon CCU. However, the 
codon recognized by this tRNA, AGG, is used only 1340 
times in the genome, whereas the synonym AGA is used 
8195 times, yet its cognate tRNA with the anticodon UCU 
has the lowest log 2 (RPKM + 1) value of all the arginine 
tRNAs (7.88). Leucine is the most frequently used amino 
acid in the C. jejuni proteome, yet the four tRNA-Leu genes 
all have average log 2 (RPKM + 1) values. The levels of tRNA 
transcription seem to be dependent on genomic context, 
for example, the CCU anticodon tRNA-Arg is encoded 
downstream of the highly expressed fusA gene, and the 
three tRNA-Ile genes are located within rRNA operons. 

ncRNA sequences 

A limitation of expression microarrays is that they only 
provide information on specific genomic regions to which 
probe sequences have been designed. RNA-seq allows 
investigation of transcription across the whole genome, 
and hence enables the discovery of novel ncRNAs (Yoder- 
Himes et al, 2009). Four ncRNAs (excluding rRNA and 
tRNA) are currently annotated in the C. jejuni NCTC11168 
genome: the signal recognition particle RNA, the TPP 
riboswitch, the RNA component of RNaseP and lOSa RNA 



Table 1. Secondary structure and conservation analysis for C. jejuni ncRNA candidates 

ID, unique identifier containing genomic location; maxRnazP, the maximal RNAz probability on the forward or reverse strand - values greater than 
0.5 were classified as putative ncRNA genes; minAlifoldZ, the lowest (best) Z-score from the RNAalifoldZ predictions on the forward or reverse 
strand; numSeqs, no. of homologous sequences used in the Rfam seed alignment; aveLen, average length of the sequences in the Rfam seed 
alignment; GC content, average G + C content for all the sequences in the Rfam seed alignment; GC enrichment, enrichment of G + C nucleotides 
relative to the genomic background [computed as log 2 (GC-content/0.3055)]. One homologue was identified in the C. jejuni genome (EMBL 
ACC:AL111168.1) for each ncRNA candidate. 



ID no. 


MaxRnazP 


MinAlifoldZ 


numSeqs 


aveLen 


avelD 




GC 


Forward 


Reverse 


Forward 


Reverse 


Content 


Enrichment 


Intergenic_671549- 


0.957647 


0.796566 


-5.9 


-5.1 


8 


342.9 


74 


0.285 


-0.1001 


671895 




















Intergenic_722652- 


0.995352 


0.995194 


-4.6 


-4.4 


5 


89.4 


89 


0.295 


-0.0504 


722740 




















Intergenic_906748- 


0.932765 


0.89473 


-1.1 


-6.6 


5 


311 


89 


0.232 


-0.397 


907066 




















Intergenic_l 127982- 


0.999921 


0.559228 


-8.6 


-2.4 


5 


206.8 


78 


0.281 


-0.1205 


1128192 




















Intergenic_l 575021- 


0.999869 


0.999657 


-7.7 


-8.2 


5 


262.6 


88 


0.221 


-0.4671 


1575288 
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(Gundogdu et al, 2007). Transcripts were identified in the 
RNA-seq data for all of these. Additionally, we performed a 
screen of all large intergenic regions for potential unanno- 
tated ncRNA genes based on evolutionary conservation and 




Fig. 4. Consensus secondary structure predictions for the forward 
(left) and reverse (right) complement intergenic_671 549-671 895 
(a), intergenic_722652-722740 (b), intergenic_906748-907966 
(c), intergenicj 1 27982-1 1 281 92 (d) and intergenicj 575021 - 
1575288 (e). The colour markup indicates the sequence 
conservation. 



structural prediction using the Rfam alignment pipeline 
(Gardner et al, 2009), RNAz (Washietl et al, 2005) and 
Alifoldz (Washietl & Hofacker, 2004), and identified five 
candidate regions which may be worth investigating 
further (intergenic_671 549-671 895, intergenic_722652- 
722740, intergenic_906748-907066, intergenic_l 127982- 
1128192 and intergenic_l 57502 1-1 575288) (Table 1 and 
Fig. 4), although one of these (intergenic_906748-907066) 
did not show a significant level of expression in our data. 

RNA-seq and proteome analysis of C. jejuni 
hypothetical genes and pseudogenes 

mRNA transcript levels do not necessarily correlate with 
protein levels, or indeed whether the transcript is 
translated. We carried out a comprehensive proteomic 
analysis using LC-MS of peptides from whole-cell lysates of 
C. jejuni NCTC11168. We obtained good coverage with 
11462 peptides mapping to 1041 predicted genes. We did 
not obtain any peptides that mapped to any of the 
pseudogenes. This might suggest that, even though we 
obtained transcripts for a number of the pseudogenes, they 
are not translated. Alternatively, any translated products 
may be below the level of detection or below our ~6 kDa 
molecular mass cut-off. Fig. 5 shows the number of 
peptides obtained for each protein- coding gene plotted 
against the log 2 (RPKM + 1) value. Fitting a log-linear 
model with quasi-Poisson errors indicates a significant 
relationship between mRNA expression level and the 
observed peptide count (P<2 x 10 16 ). For the gene pnp 
(QT253), a peptide sequence was obtained that included 
one amino acid encoded upstream of the annotated start 



I4L1 
I2fl 



i0C 




0 2 4 fi a 10 12 14 



Fig. 5. Plot of the observed number of peptides from wild-type 
C. jejuni NCTC1 1 1 68 against log 2 (RPKM + 1 ) as a measure of the 
mRNA expression level. The blue line indicates the predicted 
protein level according to the fitted log-linear regression with 
quasi-Poisson errors. Red points indicate mRNAs predicted to 
encode peptides below the 6 kDa threshold of detection. 
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codon. This suggests that the annotated pnp start codon is 
incorrect; the most plausible alternative is the TTG codon 
at position 1183537. 

RNA-seq and microarray analysis of a C. jejuni 
rpoN mutant 

We generated a C. jejuni rpoN: : cat mutant and compared 
the growth of the mutant with the wild-type. Bacteria were 
cultured in MH broth with shaking in microaerobic 
conditions at 42 °C. The doubling time for each strain 
was determined for exponential phase growth: wild-type, 
1.40 + 0.12 generations per hour; mutant 1.41+0.01 
generations per hour. The difference between the growth 
rates was not statistically significant (Student's f-test, 
P>0.05) based on viable counts. Motility assays were 
performed on the wild- type and rpoN: : cat mutant. 



Following 72 h incubation at 42 °C in microaerobic 
conditions, the mutant was completely non-motile (data 
not shown). Genes that were expressed in the rpoN: : cat 
mutant were identified from the RNA-seq data using 
DEseq (Anders & Huber, 2010). The log 2 (fold changes) 
and estimated P- values obtained from the data are available 
in Supplementary Table S2. Twenty- five genes and two 
intergenic regions showed significantly altered expression 
in the rpoNr.cat mutant compared with the wild-type 
(P<0.1; see Table 2). Seventeen protein-coding genes 
showed significant downregulation in the rpoN mutant, 
including 12 that are annotated as encoding flagellar 
proteins and two others (Q0040 and Q1465) that are co- 
located with flagellar gene clusters. The other three 
downregulated genes (Cj0243c, Q1242 and Q1650) are all 
annotated as encoding hypothetical proteins, although 
Cjl242 has recently been identified as the invasion antigen 



Table 2. RNA-seq data for genes and intergenic regions identified as differentially expressed in the rpoN : : cat mutant relative to the 
wild-type 

The log 2 (fold change) (logFC) values for the RNA-seq and microarray, and the wild-type log 2 (RPKM + 1) values are coloured by value (red 
>yellow >blue). RNaseq and microarray P- values of <0.05 are in bold type. 



Locus 


Gene 


Product 


Logj{RPKM+l) 


RNAseq 


Microarray 








wt 




logFf: 


Adj. Fval 


logFC 


Adj. P val 




Cj0040 


Hypothetical protein Cj0O40 


9.29006 


1.114211 




6.41E-13 


-1.20475 


0.342079 


Cj 1729c 




Flagellar hook subunit protein 


9.314298 


2,259644 


-7.16717 


6.68E-20 


-4.24485 


0.000476 


CJ0042 




Putative flagellar hook assembly protein 


8.340937 


1,848599 


-6.35469 


3.5 IE- 18 


-3,17108 


0.080625 


Cj0043 




Flagellar hook protein 


7,807235 


2.317055 


-5.39872 


3.35E^09 


-5.40792 


1.08E-05 


Cj0041 




Putative flagellar hook-length control protein 


7.648091 


2.260054 


-5,32904 


1.85E-09 


-4.87652 


5.90E-05 


Cjl462 




Flagellar P-ring protein 


7.806462 


2.298871 


-5.22168 


1.10E-17 


-5.16816 


2.85E-05 


Cjl463 


M 


Hypothetical protein Q1463 


8.1 33356 


2,732458 


-5,19988 


8.87E-09 


-3,80549 


L06E-05 


Cjl242 


C)1242 


Hypothetical protein CJ1242 




5.563116 


-5.09235 


9.02E-22 


-4.11678 


2.03E-05 


qOA87t 


m 


Putative flagellar L-ring protein precursor 


8.122396 


3.144537 


-4.7493 


ti.8M-.-0K 


-3.64669 


2.03E-05 


CJ0697 




Flagellar basal-body rod protein 


8.358318 


4.108808 


-3.93854 


1.50E-4>5 


-3.95838 


9.75E-06 


Cjl466 


m 


Putative flagellar hook-associated protein 


7.370526 


3.103589 


-3.83308 


2.68E 05 


-2.24856 


9.78E-06 




Intergenic. 


731983-7321 28 


5.473063 


2,061626 


-3.53136 


0.044624 






Cj0243c 


Cj0243c 


Hypothetical protein Q0243c 


5 ,1 99924 


1.776923 


-3.42475 


0.000925 


-0.78113 


0.023892 


CJ1465 


Cjl465 


Hypothetical protein Cj 1465 


7.41.4707 


4,014025 


-3,12119 


0.003354 


-1,81513 


3.07E-05 


Cj0887c 


jlaD 


Putative ikgcllin 


7.204515 


4.118837 


-2.7323 


0.028751 




9.78E-06 


qo698 




Flagellar basal -body rod protein 


8.74644 


5.450488 


-2.71978 


0.029381 


-3.93912 


4,5lE-06 


Cj0528c 




Flagellar basal-body rod protein 


9.000553 


6.190284 


-2.60325 


0.037953 


-1.8492 


0.002595 




Intergenic, 


.1127982-1128192 


6.608378 


3.883081 


-2.5865 


0.07178 






CJ1650 


Cjl650 


Hypothetical protein 0)1650 


6.917701 


4,010168 


2.54657 


0.044624 


2.74245 


0,000383 


Cj 1726c 


melA 


Homoserine O-succinyltranaferase 


7.229019 


9.329885 


2.333074 


0.031976 


1.90992 


0.031452 


Cj089S 


Q0898 


Putative histidine triad (HIT) family protein 




12,29099 


2.380029 


0.02739S 


-0.26504 


0.229369 


Cj 1727c 


jrietB 


P u ta t iv< O-acclylhomoserine ( thiol ) - lyase 


6.91935 


8.951863 


2.43756 


0.019195 


1.441743 


0.058164 


Cj0667 


q<J667 


Putative S4 domain protein 


7.961339 


9.915982 


2.562837 


0.031976 


-0.09851 


0.815032 


Cj0454c 


Cj0454c 


Putative membrane protein 




5.898477 


2,585351 


0.031128 


0.317343 


0.246237 


CjC917c 


csfA 


Putative integral membrane protein (CstA homolog) 


6.15643 


8.510504 


2.59226 


0.007099 


0.857165 


0.171493 


Cj0912c 


cysM 


Cysteine synthase 


8,97862 


11,24426 


2,721892 


0.00485 


0.144 


0.441034 


Q0424 


Cj0424 


Putative acidic periplasmic protein 


6.721662 


10.1513 


3.667541 


6.94E-417 


0.215199 


0.316058 


< :j0423 


q0423 


Putative integral membrane protein 


4.5 61 62 1 


8.129305 


4.075151 


2J7E-06 


0.049055 


0.953024 


CJ0425 




Putative periplasm ic protein 


5.046527 


8.799093 


4 L71523 


3.43E-07 


-0.08205 


0.853347 
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CiaC, which is secreted by the C. jejuni flagellar type III 
secretion system and is potentially important in virulence 
(Christensen et al, 2009). Cjl650 is an orthologue of the 
Helicobacter pylori protein HP 1076, which has been 
identified as a flagellar export co-chaperone (Lam et al, 
2010). Two intergenic regions, located between tpx-napA 
(positions 731983-732128) and luxS^Q1199 (positions 
1127982-1128192), were also identified as significantly 
downregulated. The latter region was also identified in the 
screen for potential ncRNA genes, suggesting the possibility 
of a novel ncRNA involved in flagellar regulation. Genes 
identified as being upregulated in the rpoN mutant include 
two gene clusters: Q0423-Q0425, which is annotated as 
encoding a putative integral membrane protein and two 
putative periplasmic proteins; and metAB, which encode 
enzymes involved in amino acid metabolism and are 
located close to flgE2. An additional membrane protein 
gene, cstA, and another amino acid metabolism gene, cysM, 
are also upregulated in the rpoN mutant, together with 
Q0667, Q0898 and Q1650, all of which are of unknown 
function. Interestingly, Q0667 is the first gene in the 
operon containing rpoN. 

We compared differentially expressed genes identified in the 
RNA-seq data with those identified in a reanalysis of data 
derived from an earlier microarray-based study (Kamal etal, 
2007). Both studies compare wild-type C. jejuni NCTC1 1 168 
with an rpoN mutant, although it should be noted that 
different growth conditions were employed. (Kamal et al. 
used bacteria from a 24 h plate culture, resuspended in MH 
broth to OD 600 -0.1, and grown at 37 °C for 18 h with 
shaking at 75 r.p.m.) All but one of the genes identified as 
being significantly downregulated in the rpoN mutant by 
RNA-seq were also significantly downregulated in the 
microarray data (Fig. 6 and Supplementary Table S2). 
Conversely, for the upregulated genes, only one {met A) was 
significant in both datasets. The microarray data detected 
considerably more significantly downregulated genes than 
were identified using RNA-seq. This is in part due to the 
additional biological replicates available for the microarray 
data, which add statistical power. However, most of the 
genes in question show no evidence of downregulation in the 
RNA-seq data irrespective of P-value, which would be 
unexpected were they genuinely part of the rpoN regulon. It 
is possible that the assignment of these genes as differentially 
expressed is an artefact of the microarray hybridization or 
analysis procedure. Alternatively, the differences between the 
two datasets could reflect the different culture conditions 
employed in the two studies, although we consider this 
unlikely. As RNA-seq becomes increasingly commonplace, it 
will be possible to resolve this issue by characterizing the 
effect of growth conditions on the rpoN regulon. 



Concluding remarks 

In this study we have used direct high-throughput Illumina 
sequencing of cDNA (RNA-seq) to analyse the transcrip- 
tome of C. jejuni (NCTC11168). Our studies demonstrate 



2 ■ 



$4 




Microarray logFC 



Fig. 6. log 2 (fold change) values obtained using RNA-seq, plotted 
against the equivalent values from the microarray study (Kamal 
et al., 2007). Genes showing significant differential expression in 
the RNA-seq data (P<0.05) are highlighted in red, genes showing 
significant differential expression in the microarray data (P<0.05) 
are shown as triangles. (Red triangles, differential expression in the 
RNA-seq and microarray data; red circles, differential expression in 
the RNA-seq data; black triangles, differential expression in the 
microarray data; black circles, no differential expression in either 
data set.) 



the efficacy of high-throughput sequencing for defining 
mRNA expression levels and identifying differentially 
expressed genes and novel transcribed regions of the 
genome (for example, identification of potential unanno- 
tated ncRNA genes). We hope that the transcriptome and 
proteome datasets will be a useful resource to the 
Campylobacter research community. 
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